373 research outputs found

Optimistic Agents are Asymptotically Optimal

Author: D. Blackwell
D. Ryabko
J. Doob
L. Orseau
M. Hutter
S.J. Russell
T. Lattimore
Publication date: 01/01/2012

We use optimism to introduce generic asymptotically optimal reinforcement learning agents. They achieve, with an arbitrary finite or compact class of environments, asymptotically optimal behavior. Furthermore, in the finite deterministic case we provide finite error bounds. Comment: 13 LaTeX page

arXiv.org e-Print Archive

CiteSeerX

Crossref

The Australian National University

On the Computability of Solomonoff Induction and Knowledge-Seeking

Author: I Wood
L Orseau
M Hutter
P Gács
R Solomonoff
S Rathmanner
T Lattimore
Publication date: 15/07/2015

Solomonoff induction is held as a gold standard for learning, but it is known to be incomputable. We quantify its incomputability by placing various flavors of Solomonoff's prior M in the arithmetical hierarchy. We also derive computability bounds for knowledge-seeking agents, and give a limit-computable weakly asymptotically optimal reinforcement learning agent. Comment: ALT 201

arXiv.org e-Print Archive

Crossref

The Australian National University

Extreme State Aggregation Beyond MDPs

Author: A.L. Strehl
I. Fazekas
M. Hutter
M.L. Puterman
O.-A. Maillard
P. Nguyen
P. Sunehag
R. Givan
R.S. Sutton
S.J. Russell
T. Jaksch
T. Lattimore
V. Vovk
Publication date: 01/01/2014

We consider a Reinforcement Learning setup where an agent interacts with an environment in observation-reward-action cycles without any (esp. MDP) assumptions on the environment. State aggregation and, more generally, feature reinforcement learning are concerned with mapping histories/raw-states to reduced/aggregated states. The idea behind both is that the resulting reduced process (approximately) forms a small stationary finite-state MDP, which can then be efficiently solved or learnt. We considerably generalize existing aggregation results by showing that even if the reduced process is not an MDP, the (q-)value functions and (optimal) policies of an associated MDP with the same state-space size solve the original problem, as long as the solution can approximately be represented as a function of the reduced states. This implies an upper bound on the required state space size that holds uniformly for all RL problems. It may also explain why RL algorithms designed for MDPs sometimes perform well beyond MDPs. Comment: 28 LaTeX pages. 8 Theorem

arXiv.org e-Print Archive

Crossref

The Australian National University

Investigation of compression ratio and fuel effect on combustion and PM emissions in a DISI engine

Author: Herreros Martin
Lattimore T.
Shuai S.
Xu H.
Publication venue: Elsevier BV
Publication date: 01/04/2016

University of Birmingham Research Portal

Coventry University Pure Portal

Following the Leader and Fast Rates in Linear Prediction: Curved Constraint Sets and Other Regularities

Author: Gyorgy A
Huang R
Lattimore T
Szepesvari C
Publication venue: Neural Information Processing Systems Foundation, Inc.
Publication date: 12/08/2016

The follow-the-leader (FTL) algorithm, perhaps the simplest of all online learning algorithms, is known to perform well when the loss functions it is used on are positively curved. In this paper we ask whether there are other “lucky” settings in which FTL achieves sublinear, “small” regret. In particular, we study the fundamental problem of linear prediction over a non-empty convex, compact domain. Amongst other results, we prove that the curvature of the boundary of the domain can act as if the losses were curved: in this case, we prove that as long as the mean of the loss vectors has positive length bounded away from zero, FTL enjoys a logarithmic growth rate of regret, while, e.g., for polyhedral domains and stochastic data it enjoys finite expected regret. Building on a previously known meta-algorithm, we also get an algorithm that simultaneously enjoys the worst-case guarantees and the bound available for FTL.

arXiv.org e-Print Archive

Spiral - Imperial College Digital Repository

Following the Leader and Fast Rates in Linear Prediction: Curved Constraint Sets and Other Regularities

Author: Gyorgy A
Huang R
Lattimore T
Szepesvari C
Publication venue: Neural Information Processing Systems Foundation, Inc.
Publication date: 12/08/2016

Spiral - Imperial College Digital Repository

Universal knowledge-seeking agents for stochastic environments

Author: A. Baranes
J. Schmidhuber
L. Orseau
M. Li
R. Solomonoff
R. Sutton
S. Rathmanner
T. Lattimore
Y. Sun
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2013

We define an optimal Bayesian knowledge-seeking agent, KL-KSA, designed for countable hypothesis classes of stochastic environments and whose goal is to gather as much information about the unknown world as possible. Although this agent works for arbitrary countable classes and priors, we focus on the especially interesting case where all stochastic computable environments are considered and the prior is based on Solomonoff’s universal prior. Among other properties, we show that KL-KSA learns the true environment in the sense that it learns to predict the consequences of actions it does not take. We show that it does not consider noise to be information and avoids taking actions leading to inescapable traps. We also present a variety of toy experiments demonstrating that KL-KSA behaves according to expectation.

Crossref

HAL Descartes

The Australian National University

Bayesian reinforcement learning with exploration

Author: E. Even-Dar
I. Szita
K. Dyagilev
L. Orseau
M. Hutter
M. Kearns
M.G. Azar
P. Auer
P. Sunehag
S. Mannor
T. Lattimore
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2014

We consider a general reinforcement learning problem and show that carefully combining the Bayesian optimal policy and an exploring policy leads to minimax sample-complexity bounds in a very general class of (history-based) environments. We also prove lower bounds and show that the new algorithm displays adaptive behaviour when the environment is easier than worst-case.

Crossref

The Australian National University

Irus and his jovial crew : representations of beggars in Vincent Bourne and other eighteenth-century writers of Latin verse

Author: Beier
Bradner
Brome
Este
Foster
Fowler
Fürstenberg
Gilmore
Haan
Hay
Hitchcock
JOHN T. GILMORE
Lamb
Lattimore
Mitford
Money
Montagu
Pratt
Prior
Salamon
Salgādo
Warton
Wordsworth
Publication venue: Cambridge University Press (CUP)
Publication date: 13/03/2013

Alastair Fowler has written, with reference to the time of Milton, of ‘Latin's special role in a bilingual culture’, and this was still true in the early eighteenth century. The education of the elite placed great emphasis on the art of writing Latin verse and modern, as well as ancient, writers of Latin continued to be widely read. Collections of Latin verse, by individual writers such as Vincent Bourne (c. 1694–1747) or by groups such as Westminster schoolboys or bachelors of Christ Church, Oxford, could run into multiple editions, and included poems on a wide range of contemporary topics, as well as reworkings of classical themes. This paper examines a number of eighteenth-century Latin poems dealing with beggars, several of which are here translated for the first time. Particular attention is paid to the way in which the Latin poems recycled well-worn tropes about beggary which were often at variance with the experience of real-life beggars, and to how the specificities of Latin verse might heighten negative representations of beggars in a genre which, as a manifestation of elite culture, appealed to the very class which was politically and legally responsible for controlling them.

Crossref

Warwick Research Archives Portal Repository

Sequential Extensions of Causal and Evidential Decision Theory

Author: A Ahmed
A Egan
A Gibbard
B Skyrms
D Lewis
J Pearl
JM Joyce
L Orseau
LJ Savage
N Bostrom
N Soares
R Nozick
RC Jeffrey
SJ Russell
T Lattimore
Publication date: 24/06/2015

Moving beyond the dualistic view in AI where agent and environment are separated incurs new challenges for decision making, as calculation of expected utility is no longer straightforward. The non-dualistic decision theory literature is split between causal decision theory and evidential decision theory. We extend these decision algorithms to the sequential setting where the agent alternates between taking actions and observing their consequences. We find that evidential decision theory has two natural extensions while causal decision theory only has one. Comment: ADT 201

arXiv.org e-Print Archive

Crossref

The Australian National University